A different view of the logistic regression

We can think of a logistic regression as a model in which individuals have a latent trait that determines whether they fall into one category or another. For instance, latent scholastic aptitude is a trait that (co-)determines whether a student passes a class:

threshold = -2
set_par(cex = 1.5)
curve(dlogis(x),-6,6, xlim = c(-5,5),
      xlab = "latent scholastic aptitude")
x = seq(threshold,5,.1)
shade(dlogis(x)~x,c(threshold,6), col = adjustcolor("green3",.5))
x = seq(-6,threshold,.1)
shade(dlogis(x)~x,c(-6,threshold), col = adjustcolor("red3",.5))
abline(v = threshold, lty = 2, lwd = 2)

In this model, the latent variable is distributed according to the logistic distribution.

We can calculate the log odds of an outcome as before, this time using probabilities.

\[ \textrm{log odds} = \log \left( \frac{P_{fail}}{P_{pass}} \right) = \log \left( \frac{P_{fail}}{1-P_{fail}} \right) \]

To get the cumulative probability to fail at a certain threshold, we use the cumulative distribution function of the logistic distribution, also called the inverse logit.

In R we can use boot::inv.logit or plogis to evaluate the cumulative distribution function of the logistic distribution.
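For instance, plogis maps a threshold on the latent scale to a cumulative probability, and qlogis (the logit) maps it back:

```r
plogis(-2)          # P(latent < -2), approx. 0.12
qlogis(plogis(-2))  # back to the threshold: -2
```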

set_par(cex = 1.5)
curve(plogis(x),-6,6,xlim = c(-5,5),
      xlab = "latent scholastic aptitude",
      ylab = "plogis(x) = inv_logit(x)")
curve(plogis(x),-6,-2,col = "red", add = T, lwd = 1.5)
curve(plogis(x),-2,6,col = "green3", add = T, lwd = 1.5)
lines(c(-6,threshold),rep(plogis(threshold),2), lty = 2, col = "red")
lines(rep(threshold,2),c(0,plogis(threshold)), lty = 2, col = "red")

P_fail = plogis(threshold)
P_pass = 1-P_fail
log_odds = log(P_fail/P_pass)
log_odds
## [1] -2

This is exactly our threshold and explains why the intercept coefficient of an intercept-only logistic regression for data with a fail rate of 12% will be qlogis(.12) ≈ -2.
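We can verify this with a quick simulation using base R's glm (a sketch: fail is coded as 1, so the intercept is the log odds of failing):

```r
set.seed(42)
fail = rbinom(1e5, 1, plogis(-2))       # fail rate of about 12%
coef(glm(fail ~ 1, family = binomial))  # intercept approx. qlogis(mean(fail)) = -2
```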

set_par(mfrow = c(1,2), cex = 1.5)
curve(plogis(x),-5,5, xlab = "threshold")
arrows(x0 = threshold, y0 = 0, y1 = plogis(threshold), col = "blue", lwd = 2)
arrows(x0 = threshold, x1 = -5.4, y0 = plogis(threshold), col = "blue", lwd = 2)
text(-5.4,plogis(threshold),labels = round(plogis(threshold),2), pos = 2, col = "blue", xpd = TRUE, cex = .75)
curve(qlogis(x),0,1, xlab = "cumulative probability")
arrows(x0 = plogis(threshold), y0 = -5, y1 = threshold, col = "blue", lwd = 2)
arrows(y0 = threshold, x0 = plogis(threshold), x1 = -0.04, col = "blue", lwd = 2)
text(-5.4,plogis(threshold),labels = round(plogis(threshold),1), pos = 2, xpd = TRUE)
text(plogis(threshold),-5.4,labels = round(plogis(threshold),2), pos = 1, col = "blue", xpd = TRUE, cex = .75)

Now let us assume that there is another variable X that changes the probability to pass a class, for instance class difficulty. In the simulation below, students with X = 0 face a threshold of -3, while students in a more difficult class (X = 1) face our original threshold of -2.

In a logistic regression we estimate the following:

\[ \log \left( \frac{P_{fail}}{P_{pass}} \right) = \alpha + \beta \cdot X \]

Here \(\small \alpha\) is the intercept, i.e. the baseline threshold / log odds of failing (due to latent scholastic aptitude), \(\small X\) is our additional variable, and \(\small \beta\) captures the change in the log odds of failing due to \(\small X\).

Because the threshold is -3 for students with X = 0 and -2 for students with X = 1, we calculate

\[ \beta = -2 -(-3) = 1 \]

So if we simulate data according to this data generating process and estimate a logistic regression, we should find that \(\small \alpha = -3\) and \(\small \beta = 1\).

Let's simulate the data:

set.seed(1)
N = 10000
X = rep(c(0,1),N/2) %>% sort()
thresholds = -3 + X
pass = rlogis(N) > thresholds

We estimate the model:

q.fit = quap(
  alist(
    pass ~ dbinom(1,p),
    logit(p) <- -(a + b*X),
    a ~ dnorm(0,3),
    b ~ dnorm(0,3)
  ),
  data = list(pass = pass, X = X)
)
precis(q.fit) %>% 
  round(2)
##    mean   sd  5.5% 94.5%
## a -2.90 0.06 -3.00 -2.80
## b  0.87 0.08  0.75  0.99

Due to simulation noise we do not recover the parameters exactly, but we get very close to the correct thresholds: -2.9 at baseline and a + b = -2.9 + 0.87 = -2.03 for individuals with X = 1.

This shows us again that regression weights in logistic regressions represent changes in log odds ratios, and thus also changes in the threshold, due to a predictor variable.
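As a sanity check (a sketch reusing the simulated pass and X from above), the empirical log odds ratio between the two groups should be close to the fitted b:

```r
odds = function(p) p / (1 - p)
p0 = mean(pass[X == 0])   # pass rate at the baseline threshold of -3
p1 = mean(pass[X == 1])   # pass rate at the threshold of -2
log(odds(p0) / odds(p1))  # approx. 1, close to the estimated b
```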

Ordinal logistic regression

A simple threshold / intercepts only model

Ordinal logistic regressions are named as such because, just like logistic regressions, they assume a latent logistic variable that determines responses. Unlike standard logistic regression, ordered logistic regression uses multiple thresholds, which map the latent variable onto ordered responses:

plot_dlogis.thresh = function(threshs, xlim = c(-5,5), plot.text = TRUE) {
  threshs = as.numeric(threshs)
  x.st = xlim[1]-1
  x.en = xlim[2]+1
  curve(dlogis(x),x.st,x.en,xlim = xlim,
        xlab = "latent scholastic aptitude")
  n_cat = length(threshs) + 1
  clrs =
    colorRampPalette(c("blue","blue4"))(n_cat)
  for (k in 1:n_cat) {
    st = ifelse(k == 1, x.st, threshs[k-1])
    en = ifelse(k == n_cat, x.en, threshs[k])
    x = seq(st,en,length.out = 50)
    polygon(c(min(x),x,max(x)), c(0,dlogis(x),0), col = clrs[k])
    arrows(x0 =  threshs[k], y0 = 0, y1 = dlogis(0), lty = 2, col = "red", lwd = 2,length = 0)
    if (plot.text == TRUE) {
      text((st+en)/2,dlogis(0)/2,paste0("R=",k), col = "grey50", cex = 1.5)
      text(threshs[k],dlogis(0),round(threshs[k],2), col = "red", pos = 2)
    }
  }
}
thresholds = c(-1,.25,2)
n_cat = length(thresholds) + 1
clrs = colorRampPalette(c("blue","blue4"))(n_cat)
set_par(cex = 1.5)
plot_dlogis.thresh(thresholds)

Just as we can use the (cumulative) logistic distribution to recover threshold values (log odds ratios) for the simple logistic regression, we can use it to recover thresholds from an ordered logistic model.

For instance, when 26.8% of individuals respond with R = 1, the first threshold is recovered as the log odds of this proportion:

log(.268/(1-.268)) %>% 
  round(2)
## [1] -1

As you might have guessed, we can also read the probabilities of the different responses from the cumulative logistic distribution:

plot_plogis.thresh = function(thresholds, plot.thrsh.eq = FALSE, xlim = c(-5,5)) {
  x.st = xlim[1]-1
  x.en = xlim[2]+1
  n_cat = length(thresholds) + 1
  clrs = 
    colorRampPalette(c("blue","blue4"))(n_cat) 
  curve(plogis(x),x.st,x.en,xlim = c(-5,5),
        xlab = "latent scholastic aptitude")
  for (k in 1:n_cat) {
    s = ifelse(k == 1, x.st, thresholds[k-1])
    e = ifelse(k == n_cat, x.en, thresholds[k])
    curve(plogis(x), s,e, col = clrs[k], add = T, lwd = 2)
    
    b = ifelse(k == 1, 0, plogis(thresholds[k-1]))
    t = ifelse(k == n_cat, 1, plogis(thresholds[k]))
    x = seq(s,e,.1)
    y.p = c(plogis(x), plogis(max(x)))
    x.p = c(x,x.en)
    polygon(c(x.p,x.en),c(y.p,plogis(x[1])), col = clrs[k], border = "NA")
    arrows(x0 = -4, y1 = t, y0 = b, angle = 90, code = 3, length = .15)
    
    prob.level = 
      ifelse(
        k == 1,
        plogis(thresholds[k]),
        ifelse(
          k == n_cat,
          1-plogis(thresholds[k-1]),
          plogis(thresholds[k])-plogis(thresholds[k-1]))) %>% 
      round(2)
    text(-4, (b+t)/2, paste0("P(R=",k,")=", prob.level), pos = 4)
    
    if (k < n_cat) {
      p=plogis(thresholds[k])
      thrsh = log(p/(1-p)) %>% round(1)
      thrsh.equ = bquote(frac(.(round(p,2)),.(round(1-p,2)))~"=exp("~.(thrsh)~")")
      if (plot.thrsh.eq == TRUE) text(5.5,p,thrsh.equ, xpd = T, pos = 4)
      abline(h = p, lty = 2, col = "red")
    }
  }
  points(thresholds,rep(0,n_cat-1), pch = "|", col = "red")
}
set_par(cex = 1.5, mar = c(3,3,.5,7))
plot_plogis.thresh(thresholds, plot.thrsh.eq = TRUE)

Let’s again simulate data from the model and estimate the thresholds.

Here we simulate data:

N = 1000
R = cut(
  rlogis(1000),
  breaks = c(-Inf,thresholds,Inf)) %>% 
  as.numeric()

which we can show as a histogram and as cumulative counts:

set_par(mfrow = c(1,2))
R %>% 
  table() %>% 
  barplot(ylab = "N",
          col = clrs,
          xlab = "Response")

m = diag(4)
m[upper.tri(m)] = 1
m = apply(m,2, function(x) x*R %>% table())
x = barplot(m, col = clrs,
        ylab = "cumulative N",
        xlab = "Response",
        names.arg = 1:n_cat)
ps = c(plogis(thresholds[1]),diff(c(plogis(thresholds),1)))
cs = c(0,plogis(thresholds),1)
for (k in 1:n_cat)
  text(tail(x,1),
       mean(cs[k:(k+1)])*N,
       paste(round(ps[k]*100),"%"),
       col ="white",
       cex = .5)

Here is an ordered logistic model:

bol.model = 
  alist(
    R ~ dordlogit(0,thresholds),
    thresholds ~ dnorm(0,3)
  )

The key difference between the simple logistic and the ordered logistic regression is that the former has a single threshold parameter:

\[ \log \left( \frac{P(fail)}{P(pass)} \right) = \alpha \]

and the latter has one threshold parameter per category boundary, i.e. \(\small \textrm{number of levels} - 1\) thresholds \(\small \alpha_k\):

\[ \log \left( \frac{P(R \le k)}{P(R > k)} \right) = \alpha_k \]

The dordlogit function calculates the likelihood of the observations given the model using the plogis / inv_logit function, whereby different threshold values lead to different likelihoods:

set_par(mfrow = c(2,2))

thresh_list = list(thresholds,thresholds-1.25)

for (k in 1:2) {
  thres = thresh_list[[k]]
  plot_plogis.thresh(thres)
  rbind(R %>% 
        table %>% 
        prop.table(),
      rlogis(100) %>% 
        cut(c(-Inf,thres,+Inf)) %>% 
        table %>% prop.table()) %>% 
  barplot(beside = T, 
          col = c("grey25","blue"),
          ylab="proportion",
          xlab = "Response")
legend("topleft", fill = c("grey25","blue"), 
       legend = c("data",expression("model | "~theta)), bty = "n")

}

So the basic ordered logistic regression model tries to find the threshold values that allow it to reproduce the observed distribution of ratings.
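The likelihood computation behind dordlogit can be sketched by hand (a minimal version; rethinking's dordlogit performs the equivalent computation):

```r
# probability of response R given linear predictor phi and thresholds alpha
dordlogit_manual = function(R, phi, alpha) {
  cum  = c(plogis(alpha - phi), 1)  # cumulative probabilities P(R <= k)
  prob = cum - c(0, head(cum, -1))  # per-category probabilities
  prob[R]
}
dordlogit_manual(1, phi = 0, alpha = c(-1, .25, 2))  # plogis(-1), approx. 0.27
```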

We fit the model.

bol.fit = ulam(
  bol.model,
  data=list(R=R),
  chains=4,cores=4)

And here are the estimated thresholds:

precis(bol.fit, depth = 2) %>% plot()
threshold.est = precis(bol.fit, depth = 2)[,1]
text(threshold.est,
     3:1,
     round(threshold.est,2), pos = 3)

As expected, we recover the correct threshold values (with some error/bias due to simulation noise and the prior).

Ordinal logistic regression with predictors

To understand how to set up a model such that a variable can increase or decrease the probability of higher response levels, let's look at the plot of the latent variable and thresholds again. The plot also shows a hypothetical person with a latent scholastic aptitude of 0 as a white dot.

set_par(cex = 1.5)
plot_dlogis.thresh(thresholds)
plotrix::draw.circle(0,.005,radius = .1,col = "white")

If we want to increase the chance that this hypothetical person achieves a rating of R = 3, we can either increase the value of the latent variable scholastic aptitude or shift the thresholds to the left:

set_par(mfrow = c(1,2),cex = 1.25)
thresholds = c(-1,0,2)
plot_dlogis.thresh(thresholds)
arrows(x0 = 0, x1 = 1, y0 = .005, length = .1, col = "white")
plotrix::draw.circle(1,.005, radius = .1,col = "white")
plotrix::draw.circle(0,.005, radius = .1,col = adjustcolor("white",alpha = .75))


thresholds = c(-2,-1,1)
plot_dlogis.thresh(thresholds)
abline(v = thresholds+1, col = adjustcolor("red",alpha = .25), lty = 2)
arrows(x0 = thresholds+1, x1 = thresholds,
       y0 = c(seq(.025,.075,length.out = 3)),
       col = "red", length = .1)
plotrix::draw.circle(0,.005,radius = .1,col = "white")

If we shift the latent value or the thresholds by the same amount, the probability of a higher response rises equally. In both plots above the bright white dot sits in the middle between the 2/3 and 3/4 thresholds.
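A quick simulation sketch confirms the equivalence: shifting the latent values up by one produces exactly the same response distribution as shifting all thresholds down by one.

```r
set.seed(3)
thr = c(-1, 0, 2)
L = rlogis(1e5)
p1 = table(cut(L + 1, c(-Inf, thr, Inf))) / 1e5  # latent shifted up by 1
p2 = table(cut(L, c(-Inf, thr - 1, Inf))) / 1e5  # thresholds shifted down by 1
# p1 and p2 are identical, because L + 1 < t is the same event as L < t - 1
```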

Therefore we can, as we saw earlier for the logistic regression, add terms for individual level effects to the basic threshold / intercept only model to capture the effect of predictor variables on responses:

\[ \log \left( \frac{P_i(R \le k)}{P_i(R > k)} \right) = \alpha_k - \beta \cdot X_i \]

where \(\small \alpha_k\) are the thresholds and \(\beta\) captures the effect of the variable \(X\) by shifting each individual's thresholds.

Now, we have seen previously that a threshold is just the log odds of responding in a category below vs above the threshold. Accordingly, a positive (negative) regression weight in an ordinal logistic regression can be interpreted as the log odds ratio of giving a one level higher (lower) response, i.e. R = 4 instead of R = 3 (R = 3 instead of R = 4).

Because there is only one regression weight \(\small \beta\), the log odds ratio is the same at every category boundary. Writing \(\small \textrm{odds}(R > k) = P(R > k)/P(R \le k)\), this means

\[ \log \left( \frac{\textrm{odds}(R > 1 \mid X{=}1)}{\textrm{odds}(R > 1 \mid X{=}0)} \right) = \log \left( \frac{\textrm{odds}(R > 2 \mid X{=}1)}{\textrm{odds}(R > 2 \mid X{=}0)} \right) = \log \left( \frac{\textrm{odds}(R > 3 \mid X{=}1)}{\textrm{odds}(R > 3 \mid X{=}0)} \right) = \beta \]

This is referred to as the proportional odds assumption, which may or may not hold in the data.
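The proportional odds assumption can be seen in a quick simulation sketch: a latent shift of β = 1 moves the cumulative log odds by 1 at every threshold.

```r
set.seed(2)
thr = c(-1, .25, 2)
R0 = as.numeric(cut(rlogis(1e5), c(-Inf, thr, Inf)))      # X = 0
R1 = as.numeric(cut(rlogis(1e5) + 1, c(-Inf, thr, Inf)))  # X = 1: latent shift of 1
sapply(1:3, function(k)
  qlogis(mean(R1 > k)) - qlogis(mean(R0 > k)))  # approx. 1 at every threshold
```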

To see the ordered logistic regression in action, we'll use the example data from the Portuguese school again. In particular, we look at the (coarsened) first-period grade and again estimate the association with maternal education.

df=read.table("../Chapter11/data/student-mat.csv",sep=";",header=TRUE)
df = df[df$Medu>0,]
set_par()
df$G1 = as.numeric(cut(df$G1, breaks = seq(0,20,2),include.lowest = T))-1
df$G1 %>% table() %>% barplot()

To recapitulate that the thresholds are just log odds at the category boundaries, let's calculate them by hand:

cum_prob = df$G1 %>% 
  table() %>% 
  prop.table() %>% 
  cumsum()
cum_prob = cum_prob[cum_prob!=1]
simple_thresholds = log(cum_prob/(1-cum_prob))
simple_thresholds %>% round(2)
##     1     2     3     4     5     6     7     8 
## -5.27 -2.39 -0.94 -0.04  0.75  1.63  2.98  4.86

And we estimate a thresholds only model with ulam:

tm.fit = ulam(
  alist(
    G1 ~ dordlogit(0,thresholds),
    thresholds ~ dnorm(0,3)
  ),
  data=list(G1 = df$G1),
  chains=4,cores=4)

And we plot the simple thresholds calculated manually against those estimated with ulam:

post = extract.samples(tm.fit)
ulam_thresholds = colMeans(post$thresholds)
CI = apply(post$thresholds, 2, PI)
set_par(cex=1.5)
plot(simple_thresholds,
     ulam_thresholds,
     ylim = range(CI),
     pch = 16, col = "blue")
arrows(x0 = simple_thresholds,
       y0 = CI[1,], y1 = CI[2,],
       col = "blue", length = 0)
abline(0,1,lty = 3, col = "grey")

Now that we have understood thresholds, we can implement a model that uses predictors. Specifically, we estimate the effect of past failure and maternal education on grades.

ol.model = 
  alist(
    G1 ~ dordlogit(phi,thresholds),
    phi <- bM*Medu + iFE*failures,
    iFE <- bF + bFE*Medu,
    c(bF, bM, bFE) ~ normal(0,1),
    thresholds ~ dnorm(0,3)
  )

We fit the model.

fn = "ol_fit.Rdata"
if (file.exists(fn)) {
  load(fn)
} else {
  ol.fit = ulam(
  ol.model,
  data=list(G1 = df$G1, Medu = df$Medu, failures = df$failures),
  chains=4,cores=4)
  save(ol.fit,file = fn)
}

We first just check Rhat values to make sure the model converged:

precis(ol.fit, depth = 2) %>% round(3)
##                 mean    sd   5.5%  94.5%    n_eff Rhat4
## bFE           -0.267 0.122 -0.459 -0.069 1527.724 1.002
## bM             0.300 0.091  0.150  0.449 2085.768 1.000
## bF            -0.306 0.289 -0.764  0.160 1473.478 1.002
## thresholds[1] -5.223 0.730 -6.456 -4.141 1582.945 1.001
## thresholds[2] -2.145 0.322 -2.660 -1.645 1950.518 0.999
## thresholds[3] -0.481 0.283 -0.947 -0.042 2282.431 0.999
## thresholds[4]  0.533 0.281  0.075  0.974 2191.672 0.999
## thresholds[5]  1.391 0.286  0.921  1.850 2195.017 0.999
## thresholds[6]  2.332 0.306  1.836  2.817 2202.651 1.000
## thresholds[7]  3.702 0.358  3.141  4.261 2314.474 1.000
## thresholds[8]  5.615 0.617  4.712  6.694 2244.717 1.000

This looks OK.

As a quick posterior predictive check we compare the histogram of the observed and predicted grades. We use the sim function instead of the link function to get posterior predictions on the scale of the ordered outcome variable.

set_par()

pp = sim(ol.fit)
obs.counts = cut(df$G1, breaks = seq(.5,10.5,1)) %>% table()
plot(obs.counts, 's', xaxt = "n", ylim = c(0,110)) 
axis(1,at = (1:9)+.5, labels = 1:9, lwd = 2)
for (k in 1:250) {
  pp.counts = cut(pp[k,], breaks = seq(.5,10.5,1)) %>% table()
  lines(1:10, pp.counts, 's', col = adjustcolor("blue",.25))
}

Let's quickly compare these thresholds with the thresholds from the thresholds-only model again:

post = extract.samples(ol.fit)
ulam_thresholds = colMeans(post$thresholds)
CI = apply(post$thresholds, 2, PI)
set_par(cex=1.5)
plot(simple_thresholds,
     ulam_thresholds,
     ylim = range(CI),
     pch = 16, col = "blue")
arrows(x0 = simple_thresholds,
       y0 = CI[1,], y1 = CI[2,],
       col = "blue", length = 0)
abline(0,1,lty = 3, col = "grey")

Because the model now also estimates effects of education and past failures, the thresholds are different. Only if the covariates were independent of grades would we expect to see identical thresholds.

Now let's look at the coefficients of interest:

coeffs = precis(ol.fit, pars = c("bM","bF","bFE"))
coeffs %>% plot()

coeffs %>% round(2)
##      mean   sd  5.5% 94.5%   n_eff Rhat4
## bM   0.30 0.09  0.15  0.45 2085.77     1
## bF  -0.31 0.29 -0.76  0.16 1473.48     1
## bFE -0.27 0.12 -0.46 -0.07 1527.72     1

What do these results mean?

  • for each level of maternal education, the odds ratio to get a one level higher grade is exp(0.3) = 1.35
  • for each past fail, the odds ratio to get a one level higher grade is exp(-0.31) = 0.74, though we are uncertain that this odds ratio is really below 1
  • the “effect” of past fails depends on maternal education, with a log odds ratio of -0.27
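Since the bullets above quote point estimates, here is a sketch of how to summarize the full posterior of the odds ratios (assumes the ol.fit object from above):

```r
post = extract.samples(ol.fit)
# exponentiate the posterior samples, then summarize: mean and 89% PI on the odds ratio scale
sapply(post[c("bM", "bF", "bFE")],
       function(x) round(c(mean(exp(x)), PI(exp(x))), 2))
```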

It is a bit hard to interpret these results, because they are on the log odds ratio scale and because it is not totally clear (to me) how to interpret interaction effects on this scale. Therefore, we use posterior predictions to understand the results better.

First, we create posterior predictions for all levels of maternal education and past fails:

sim.data = expand.grid(
  Medu = sort(unique(df$Medu)),
  failures = sort(unique(df$failures)))

pp = sim(ol.fit,data = sim.data)

Here is a general overview:

mu = colMeans(pp)
CI = apply(pp,2,PI)
set_par(cex = 1.5)
plot(0,type = "n", xlim = c(0.5,4.5),
     ylim = range(CI), xaxt = "n",
     xlab = "previous fails",
     ylab = "grade")
axis(1,at = 1:4)

for (Medu in 1:4) {
  idx = sim.data$Medu == Medu
  x = (1:4)+(Medu-2.5)/6
  points(x,mu[idx], pch = 16, col = Medu)
  arrows(x0 = x, y0 = CI[1,idx], y1 = CI[2,idx], length = 0, col = Medu)
}
legend("topright", pch = 16, col = 1:4, legend = 1:4, title = "Medu", bty = "n")

This plot suggests several questions:

  • Do grades decrease from 0 to 3 previous fails?
  • Is higher maternal education associated with higher grades when there were no previous fails, and with lower grades when there were 3-4 fails?
  • Are the two slopes just mentioned robustly different?

Let's first look at the data for which we simulated outcomes:

sim.data
##    Medu failures
## 1     1        0
## 2     2        0
## 3     3        0
## 4     4        0
## 5     1        1
## 6     2        1
## 7     3        1
## 8     4        1
## 9     1        2
## 10    2        2
## 11    3        2
## 12    4        2
## 13    1        3
## 14    2        3
## 15    3        3
## 16    4        3
phi = link(ol.fit,data = sim.data)$phi
thresholds = extract.samples(ol.fit)$thresholds
pp = matrix(NA, nrow = nrow(thresholds), ncol = nrow(sim.data))
for (k in 1:nrow(sim.data)) {
  cum_prob = plogis(thresholds - phi[,k])           # P(R <= j) for each posterior sample
  cat_prob = cbind(cum_prob,1) - cbind(0,cum_prob)  # per-category probabilities
  pp[,k] = cat_prob %*% (1:9)                       # expected grade
}

To answer the first question, Do grades decrease from 0 to 3 previous fails?, we calculate:

\[ \delta = \frac{1}{3}\sum_{i=1}^{3} \left[ G(\textrm{fails}=i) - G(\textrm{fails}=i-1) \right] \]

posterior_hist = function(x,label, set_par = TRUE, xlim = NULL) {
  if (set_par == T) set_par(cex = 1.5, mar=c(3,3,3,.5))
  hist(x, ylab = label, xlim = xlim, xlab = label,
       main = paste0(round(mean(x),2),", (",paste(round(PI(x),2), collapse = ", "),")"))
  abline(v = 0, col = "red", lty = 3)
  abline(v = (PI(x)), col = "blue", lwd = 2)
}
delta = rep(0,nrow(pp))
for (i in 1:3) {
  delta = delta +
    pp[,which(sim.data$failures == i)] - 
    pp[,which(sim.data$failures == i-1)]
}
delta = delta / 3
posterior_hist(delta, "delta fails", xlim = range(delta))

To answer the second question, Is higher maternal education associated with higher grades when there were no previous fails and with lower grades when there were 3-4 fails?, we first calculate the effect of maternal education for students with 0 and with 2 or 3 previous fails:

# calculate contrasts
delta_F0 = rep(0,nrow(pp))
delta_F23 = rep(0,nrow(pp))
for (i in 2:4) {
  delta_F0 = delta_F0 +
    pp[,which(sim.data$Medu == i & sim.data$failures == 0)] - 
    pp[,which(sim.data$Medu == i-1 & sim.data$failures == 0)]
  delta_F23 = delta_F23 +
    rowMeans(pp[,which(sim.data$Medu == i & sim.data$failures > 1)]) - 
    rowMeans(pp[,which(sim.data$Medu == i-1 & sim.data$failures > 1)])
}
delta_F0 = delta_F0 / 3
delta_F23 = delta_F23 / 3

# plot posterior distribution
set_par(mfrow = c(1,2), cex = 1.25, mar=c(3,3,3,.5))
posterior_hist(
  delta_F0,
  "delta(fails=0)",
  set_par = FALSE,
  xlim = c(-.7,.3))
posterior_hist(
  delta_F23,
  "delta(fails>1)",
  set_par = FALSE,
  xlim = c(-.7,.3))

We see that whereas higher maternal education is associated with better grades for students who have not yet failed the class, there is weak evidence for the opposite effect among students who have already failed the class 2-3 times. To see if the two effects are really different, we need to compare them explicitly, which we do by calculating the difference between the two deltas:

interaction_eff = delta_F0-delta_F23
posterior_hist(interaction_eff, "delta(fails=0)-delta(fails>1)", xlim = range(interaction_eff))

The slopes are indeed clearly different.

Summary